AN: An Interlingual A

Author

  • Bonnie Dorr
Abstract

Machine translation has been a particularly difficult problem in the area of Natural Language Processing for over two decades. Early approaches to translation failed in part because interaction effects of complex phenomena made translation appear to be unmanageable. Later approaches to the problem have succeeded but are based on many language-specific rules. To capture all natural language phenomena, rule-based systems require an overwhelming number of rules; thus, such translation systems have either limited coverage or poor performance due to formidable grammar size. This paper presents an implementation of an “interlingual” approach to natural language translation. The UNITRAN system relies on principle-based descriptions of grammar rather than rule-oriented descriptions.[2] The model is based on linguistically motivated principles and their associated parameters of variation. Because a few principles cover all languages, the unmanageable grammar size of alternative approaches is no longer a problem.

The problem addressed in this paper is to construct a translation model that operates cross-linguistically without relying on complex language-specific rules. Many machine translation systems depend heavily on context-free rule-based systems. For example, the METAL system [Slocum, 1984], [Slocum and Bennett, 1985] is a transfer approach that relies on a large database of rules per language, solely for syntactic processing.

The aim of this paper is to present the computational framework for UNITRAN, a syntactic translation system currently operating bidirectionally between Spanish and English, and to put into perspective how the design of the system differs from and compares to other translation designs. The distinction between rule-based (non-interlingual) and principle-based (interlingual) systems will be presented, and the advantages of the principle-based design over other designs will be discussed. Finally, an overview of the UNITRAN design will be given, and a translation example will be shown.

The model that has been constructed is based on abstract principles of the “Government and Binding” (GB) [Chomsky, 1981] framework. The grammar is viewed as a modular system of principles rather than a large set of language-specific rules. Distinctions among languages are handled by settings of parameters associated with the principles. Several types of phenomena are handled without sacrificing cross-linguistic application (Table 1 shows some examples). The system gives the user access to parameter settings, thus enabling additional languages to be handled. Interaction effects of the principles are handled by the system, not the user, thus eliminating the task of spelling out the details of rule applications.

Before the source language processing (parsing) takes place, the parameters are set according to the source language values, and are then reset according to the target language values before target language processing (generation) occurs. For example, a “constituent order” parameter is associated with a universal principle that requires a language-dependent ordering of constituents with respect to a phrase. The user should ...

Table 1: Sentences handled by UNITRAN

  Verb Preposing: ¿Qué vio Juan? ‘What did John see?’
  Null Subject: Vio al hombre.
  The man that John saw that ate dinner left. ‘El hombre a quien Juan vio que comió la cena salió.’

[1] This report describes research done at the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. Support for this work has been provided in part by the Advanced Research Projects Agency of the Department of Defense under Office of Naval Research contracts N00014-80-C-0505 and N00014-85-K-0124, and also in part by NSF Grant DCR-85552543 under a Presidential Young Investigator’s Award to Professor Robert C. Berwick.

[2] The name UNITRAN stands for UNIversal TRANslator; that is, the system serves as the basis for translation across a variety of languages, not just two languages or a family of languages.

[3] The “{..., ...}” notation denotes optionality. Thus, the subject of the sentence may be either he or she.

From: AAAI-87 Proceedings. Copyright ©1987, AAAI (www.aaai.org). All rights reserved.
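The parameter-setting scheme described in the abstract (fixed universal principles, with per-language parameter values set before parsing and reset before generation) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the names `Parameters`, `SETTINGS`, `order_phrase`, `realize_subject`, and `generate`, the two parameters chosen, and the `head_final_demo` entry are hypothetical and are not taken from the UNITRAN implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Parameters:
    """Hypothetical parameter settings for one language (illustrative names)."""
    head_initial: bool   # constituent order: does the head precede its complement?
    null_subject: bool   # may the subject be left unpronounced?


# Illustrative settings only; "head_final_demo" is an invented language profile.
SETTINGS = {
    "spanish": Parameters(head_initial=True, null_subject=True),
    "english": Parameters(head_initial=True, null_subject=False),
    "head_final_demo": Parameters(head_initial=False, null_subject=True),
}


def order_phrase(head: str, complement: str, params: Parameters) -> List[str]:
    """One shared principle: order head and complement per the parameter value."""
    return [head, complement] if params.head_initial else [complement, head]


def realize_subject(subject: Optional[str], params: Parameters) -> Optional[str]:
    """One shared principle: a missing subject must be made overt
    unless the language allows null subjects."""
    if subject is None and not params.null_subject:
        return "{he, she}"  # optionality notation: either pronoun is possible
    return subject


def generate(subject: Optional[str], verb: str, obj: str, language: str) -> str:
    """Look up the language's parameter values, then apply the same principles."""
    params = SETTINGS[language]
    words: List[str] = []
    overt_subject = realize_subject(subject, params)
    if overt_subject is not None:
        words.append(overt_subject)
    words.extend(order_phrase(verb, obj, params))
    return " ".join(words)


if __name__ == "__main__":
    print(generate(None, "vio", "al hombre", "spanish"))
    print(generate(None, "saw", "the man", "english"))
```

The point of the sketch is that no function mentions a specific language: adding a language means adding one entry to the settings table, not writing new rules.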


Similar resources

Strategies Used in the Translation of Interlingual Subtitling

This study was an attempt to identify the interlingual strategies employed to translate English subtitles into Persian and to determine their frequency. Unlike in many countries, subtitling is a new field in Iran. The study, a corpus-based, comparative, descriptive, non-judgmental analysis of an English-Persian parallel corpus, comprised English audio scripts of five movies of differ...


On the Relationship between Intralingual/Interlingual Translation and Speaking Fluency of the Iranian Advanced EFL Learners

Speaking as an initial goal in language teaching and learning has relationships with many variables including listening, reading, writing, knowledge of vocabulary as well as grammar. The present study mainly aims at examining the relationship between translations and speaking fluency. For this purpose and following an experimental design, three groups of Iranian advanced EFL learners were...


From Bilingual Dictionaries to Interlingual Document Representations

Mapping documents into an interlingual representation can help bridge the language barrier of a cross-lingual corpus. Previous approaches use aligned documents as training data to learn an interlingual representation, making them sensitive to the domain of the training data. In this paper, we learn an interlingual representation in an unsupervised manner using only a bilingual dictionary. We fi...


Spelling Errors of Iranian School-Level EFL Learners: Potential Sources

With the purpose of examining the sources of spelling errors of Iranian school level EFL learners, the present researchers analyzed the dictation samples of 51 Iranian senior and junior high school male and female students majoring at an Iranian school in Baku, Azerbaijan. The content analysis of the data revealed three main sources (intralingual, interlingual, and unique) with seven patterns o...


Interlingual Indexing across Different Languages

We present two methods for automatic indexing, which are based on an interlingual layer of content description. In the first approach, we acquire indexing patterns from English documents by statistically relating interlingual representations of English documents (based on text token bigrams) to their associated


Towards An Interlingual Treatment of Modality

Modality is an important, but complex linguistic phenomenon that concerns all levels of language production. NLP research has rather refrained from this subject, but we show that many errors in machine translation systems are directly related to the absence of a proper interlingual treatment of modality. We outline the traces of such a modal interlingua by presenting the “Module of Modality”, p...




Publication date: 1999